Yandex School of Data Analysis Russian-English Machine Translation System for WMT14
Abstract
This paper describes the Yandex School of Data Analysis Russian-English system submitted to the ACL 2014 Ninth Workshop on Statistical Machine Translation shared translation task. We start with the system that we developed last year and investigate a few methods that were successful at the previous translation task, including an unpruned language model, the operation sequence model, and the new reparameterization of IBM Model 2. Next, we propose a simple yet practical algorithm to transform a Russian sentence into a more easily translatable form before decoding. The algorithm is based on the linguistic intuition of native Russian speakers who are also fluent in English.
Similar resources
Yandex School of Data Analysis Machine Translation Systems for WMT13
This paper describes the English-Russian and Russian-English statistical machine translation (SMT) systems developed at Yandex School of Data Analysis for the shared translation task of the ACL 2013 Eighth Workshop on Statistical Machine Translation. We adopted a phrase-based SMT approach and evaluated a number of different techniques, including data filtering, spelling correction, alignment of l...
Machine Translation and Monolingual Postediting: The AFRL WMT-14 System
This paper describes the AFRL statistical MT system and the improvements that were developed during the WMT14 evaluation campaign. As part of these efforts we experimented with a number of extensions to the standard phrase-based model that improve performance on Russian to English and Hindi to English translation tasks. In addition, we describe our efforts to make use of monolingual English spe...
Yandex School of Data Analysis approach to English-Turkish translation at WMT16 News Translation Task
We describe the English-Turkish and Turkish-English translation systems submitted by the Yandex School of Data Analysis team to the WMT16 news translation task. We successfully applied hand-crafted morphological (de-)segmentation of Turkish, syntax-based pre-ordering of English in English-Turkish and post-ordering of English in Turkish-English. We perform desegmentation using SMT and propose a simple y...
chrF: character n-gram F-score for automatic MT evaluation
We propose the use of character n-gram F-score for automatic evaluation of machine translation output. Character n-grams have already been used as a part of more complex metrics, but their individual potential has not been investigated yet. We report system-level correlations with human rankings for 6-gram F1-score (CHRF) on the WMT12, WMT13 and WMT14 data as well as segment-level correlation fo...
Supertag Based Pre-ordering in Machine Translation
This paper presents a novel approach to integrating mildly context-sensitive grammar into pre-ordering for machine translation. We discuss the linguistic insights available in this grammar formalism and use it to develop a pre-ordering system. We show that mildly context-sensitive grammar proves to be beneficial over context-free grammar, which facilitates better reordering rules. Fo...